
[None][fix] Honor Qwen Image quant ignore list #15599

Open
pst2154 wants to merge 1 commit into NVIDIA:main from pst2154:codex/qwen-image-quant-ignore

Conversation

@pst2154

@pst2154 pst2154 commented Jun 24, 2026

Contributor

Description

Fix Qwen-Image dynamic quantization so quant_config.ignore is applied to the module graph, matching the behavior used by other VisualGen transformers such as WAN and FLUX.

The Qwen-Image dynamic weight loader already skipped load-time quantization for ignored module names, but the Linear modules were still constructed with the global quant config. That meant excluded modules could still retain NVFP4/FP8 module state and activation behavior, making selective quantization appear ineffective.

This change adds an exclusion pass in QwenImageTransformer2DModel that replaces ignored Linear.quant_config values with a no-op weight quant config, while preserving any KV cache quant setting. It also adds a unit test that verifies ignored Qwen modules are left unquantized while non-ignored modules keep NVFP4.
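
To make the exclusion pass concrete, here is a minimal sketch of the idea, not the code in this PR. `QuantConfig`, `Linear`, and `ToyTransformer` are simplified pure-Python stand-ins rather than the TensorRT-LLM classes, the `attn.to_q` module name is illustrative, and glob-style matching via `fnmatchcase` is an assumption about how ignore patterns are compared against module names.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class QuantConfig:
    """Hypothetical stand-in for the real quantization config."""

    quant_algo: Optional[str] = None
    kv_cache_quant_algo: Optional[str] = None
    exclude_modules: List[str] = field(default_factory=list)


class Linear:
    """Toy leaf module that only carries a quant_config."""

    def __init__(self, quant_config: QuantConfig) -> None:
        self.quant_config = quant_config


class ToyTransformer:
    """Toy model: every Linear starts with the global quant config."""

    def __init__(self, quant_config: QuantConfig) -> None:
        self.quant_config = quant_config
        self._modules: Dict[str, Linear] = {
            "txt_in": Linear(quant_config),
            "transformer_blocks.0.attn.to_q": Linear(quant_config),
            "proj_out": Linear(quant_config),
        }
        self.apply_quant_config_exclude_modules()

    def named_modules(self) -> Iterator[Tuple[str, Linear]]:
        return iter(self._modules.items())

    def apply_quant_config_exclude_modules(self) -> None:
        patterns = self.quant_config.exclude_modules
        if not patterns:
            return
        # No weight quantization, but keep the KV-cache setting.
        no_quant = QuantConfig(
            quant_algo=None,
            kv_cache_quant_algo=self.quant_config.kv_cache_quant_algo,
        )
        for name, module in self.named_modules():
            excluded = any(fnmatchcase(name, p) for p in patterns)
            if isinstance(module, Linear) and excluded:
                module.quant_config = no_quant


model = ToyTransformer(
    QuantConfig("NVFP4", "FP8", exclude_modules=["txt_in", "proj_out"])
)
effective = {name: m.quant_config.quant_algo for name, m in model.named_modules()}
print(effective)
```

In this toy setup the two listed modules end up with `quant_algo=None` while the attention projection keeps `NVFP4`, and the KV-cache algorithm is carried over unchanged.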

Testing

  • python -m py_compile tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
  • git diff --check

Could not run targeted pytest locally because tests/unittest/conftest.py imports mpi4py, which is not installed in this environment.

Summary by CodeRabbit

  • New Features
    • Qwen Image models now respect quantization exclusion settings, allowing selected layers to stay unquantized while the rest of the model still uses the configured quantization behavior.
    • Improved initialization so excluded components are handled automatically at model load time.

@chang-l chang-l requested a review from yibinl-nvidia June 24, 2026 19:02
@pst2154 pst2154 force-pushed the codex/qwen-image-quant-ignore branch from d846c5d to c94b440 Compare June 24, 2026 21:22
@pst2154 pst2154 marked this pull request as ready for review June 24, 2026 21:39
@pst2154 pst2154 requested a review from a team as a code owner June 24, 2026 21:39
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

QwenImageTransformer2DModel now applies per-module quantization exclusions during initialization by replacing excluded Linear submodules' quant configs with a no-quantization config that preserves the KV-cache quantization algorithm. A unit test now checks excluded modules lose quantization while non-excluded modules retain NVFP4.

Changes

Qwen-Image transformer quantization exclusions

| Layer / File(s) | Summary |
| --- | --- |
| Initialization hook and exclusion helper: tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py | Adds QuantConfig import, calls the exclusion helper from __init__, and defines logic that rewrites excluded Linear submodules' quant_config values. |
| Registry test for excluded submodules: tests/unittest/_torch/visual_gen/test_qwen_image_registry.py | Adds a unit test that builds an NVFP4 dynamic quantization config with an exclude list and checks excluded modules disable quantization while others keep it. |

Sequence Diagram(s)

sequenceDiagram
  participant QwenImageTransformer2DModel
  participant model_config_quant_config as model_config.quant_config
  participant Linear
  participant QuantConfig
  QwenImageTransformer2DModel->>model_config_quant_config: read quant_config.exclude_modules
  QwenImageTransformer2DModel->>QwenImageTransformer2DModel: call apply_quant_config_exclude_modules()
  QwenImageTransformer2DModel->>Linear: inspect named_modules() for excluded submodules
  QwenImageTransformer2DModel->>QuantConfig: build a no-quantization config with the KV-cache algorithm
  QwenImageTransformer2DModel->>Linear: replace quant_config on excluded Linear modules

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title follows the required [None][fix] format and clearly summarizes the Qwen Image quant ignore-list fix. |
| Description check | ✅ Passed | The description covers the bug, fix, and testing, and is mostly complete despite missing the explicit PR Checklist and Test Coverage headings. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py (1)

829-891: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Apply exclusions before Linear.create_weights() consumes the quant config.

Linear.__init__ creates weights immediately unless skip_create_weights_in_init=True, and create_weights() caches quant_method from the original NVFP4 config. Mutating only module.quant_config afterward can leave excluded modules with an NVFP4 quant_method and quantized weight layout, so forward/load/post-load paths can still behave as quantized even though quant_algo is None. Move this exclusion before weight creation, or rebuild/reset the effective quant_method and weights for already-created Linear modules.
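
The ordering hazard described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the TensorRT-LLM implementation: `QuantConfig` and `Linear` are simplified stand-ins, and `quant_method` is stored as a plain string label (borrowing the names NVFP4LinearMethod and UnquantizedLinearMethod) instead of a real method object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantConfig:
    """Hypothetical stand-in holding only the weight quantization algorithm."""

    quant_algo: Optional[str] = None


class Linear:
    """Toy Linear that caches its quant method when weights are created."""

    def __init__(
        self,
        quant_config: QuantConfig,
        skip_create_weights_in_init: bool = False,
    ) -> None:
        self.quant_config = quant_config
        self.quant_method: Optional[str] = None
        if not skip_create_weights_in_init:
            self.create_weights()

    def create_weights(self) -> None:
        # The method is chosen once, from whatever config is set right now.
        if self.quant_config.quant_algo == "NVFP4":
            self.quant_method = "NVFP4LinearMethod"
        else:
            self.quant_method = "UnquantizedLinearMethod"


# Case 1: exclusion applied after weights were created.
late = Linear(QuantConfig("NVFP4"))
late.quant_config = QuantConfig(None)  # config says "unquantized" ...
print(late.quant_config.quant_algo, late.quant_method)  # ... cached method does not

# Case 2: exclusion applied before weights are created.
early = Linear(QuantConfig("NVFP4"), skip_create_weights_in_init=True)
early.quant_config = QuantConfig(None)
early.create_weights()
print(early.quant_config.quant_algo, early.quant_method)
```

In the first case the config field and the cached method disagree, which is why a test that only inspects quant_config could pass while the module still takes the quantized path. In the second case the two stay consistent because the exclusion lands before the method is chosen.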

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py`
around lines 829 - 891, Apply the quantization exclusions before
Linear.create_weights() locks in the original quant_method and weight layout.
Update QwenImageTransformer2DModel.apply_quant_config_exclude_modules so
excluded Linear modules are handled before or during construction/weight
creation, or explicitly reset their effective quant_config, quant_method, and
weights after mutation; otherwise excluded modules may still behave as
NVFP4-quantized even when quant_config.quant_algo is None.
🧹 Nitpick comments (1)
tests/unittest/_torch/visual_gen/test_qwen_image_registry.py (1)

70-70: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add the return annotation for the new test.

The Python guidelines require all functions to be annotated.

Proposed fix
-def test_transformer_applies_quant_config_ignore_list():
+def test_transformer_applies_quant_config_ignore_list() -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py` at line 70, The
new test function test_transformer_applies_quant_config_ignore_list is missing a
return annotation, and the Python guidelines require every function to be
annotated. Update the test definition to include the appropriate return type
annotation for this test in test_qwen_image_registry.py, keeping the rest of the
test logic unchanged.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py`:
- Around line 86-92: The current assertions only verify quant_config and can
miss cases where Linear.create_weights() has already cached a different
quant_method. Update the qwen image registry test to assert the effective
quantization behavior on the relevant modules, using symbols like
Linear.create_weights, quant_method, txt_in, proj_out, and transformer_blocks:
confirm excluded modules keep the unquantized quant_method while non-excluded
modules still resolve to NVFP4. Keep the coverage focused on the TensorRT-LLM
effective path, not just the config field.

---

Outside diff comments:
In `@tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py`:
- Around line 829-891: Apply the quantization exclusions before
Linear.create_weights() locks in the original quant_method and weight layout.
Update QwenImageTransformer2DModel.apply_quant_config_exclude_modules so
excluded Linear modules are handled before or during construction/weight
creation, or explicitly reset their effective quant_config, quant_method, and
weights after mutation; otherwise excluded modules may still behave as
NVFP4-quantized even when quant_config.quant_algo is None.

---

Nitpick comments:
In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py`:
- Line 70: The new test function
test_transformer_applies_quant_config_ignore_list is missing a return
annotation, and the Python guidelines require every function to be annotated.
Update the test definition to include the appropriate return type annotation for
this test in test_qwen_image_registry.py, keeping the rest of the test logic
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cf7ce349-5021-44bc-ad2e-d60309717bea

📥 Commits

Reviewing files that changed from the base of the PR and between 7193f41 and c94b440.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py
  • tests/unittest/_torch/visual_gen/test_qwen_image_registry.py

Comment thread tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
@chang-l

chang-l commented Jun 25, 2026

Collaborator

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55626 [ run ] triggered by Bot. Commit: c94b440 Link to invocation

model.load_weights({})


def test_transformer_applies_quant_config_ignore_list():

Collaborator


I think an E2E test might be more beneficial here. Could you add another E2E test with quant ignore list to tests/integration/defs/examples/visual_gen/test_visual_gen.py and make sure output can be generated?

Contributor Author


Added in 28d5424: test_qwen_image_example_with_quant_ignore in tests/integration/defs/examples/visual_gen/test_visual_gen.py. It writes a Qwen-Image dynamic FP8 config with an ignore list, runs examples/visual_gen/models/qwen_image.py, and asserts the PNG output is generated.

Signed-off-by: Alex Steiner <asteiner@nvidia.com>
@pst2154 pst2154 force-pushed the codex/qwen-image-quant-ignore branch from c94b440 to 28d5424 Compare June 25, 2026 01:47
@tensorrt-cicd

Collaborator

PR_Github #55626 [ run ] completed with state FAILURE. Commit: c94b440
/LLM/main/L0_MergeRequest_PR pipeline #44541 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chang-l

chang-l commented Jun 25, 2026

Collaborator

@pst2154 CI is currently blocked on the Pre-commit Check (GitHub Actions) — not a test or infra failure. The ruff hook auto-fixes a formatting issue but pre-commit fails whenever a hook modifies a file:

ruff .... Failed  (Found 1 error, 1 fixed, 0 remaining; files were modified by this hook)

It's a single missing blank line after the imports in tests/unittest/_torch/visual_gen/test_qwen_image_registry.py:

@@ -14,6 +14,7 @@ import pytest
 from tensorrt_llm._torch.modules.linear import NVFP4LinearMethod, UnquantizedLinearMethod
+
 # Importing the models package side-effects the ``@register_pipeline``

To unblock, run pre-commit locally, then re-stage + re-push:

pre-commit run --files tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git add tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git commit -s   # (hook already applied the fix; just re-stage & commit)
git push

Once Pre-commit Check is green, the full /bot run pipeline can proceed.



4 participants